Spontaneous identification of individual nick name from web

Author

  • Ajanthaa Lakkshmanan
Abstract

A person is often known by several names, which makes it difficult to find everything about that person on the web. For example, Michael Jackson is called "MJ" by some and the "King of Pop" by others, so searching the web by a single name is not trouble-free. Accurately identifying the names of a given person is useful in various web-related tasks such as information extraction, sentiment analysis, personal name disambiguation, and relation extraction. I propose a method to extract the nicknames of a given person name from the web. Given a name, the proposed method first extracts a set of candidate nicknames and then ranks the extracted candidates according to the likelihood that each is a correct nickname of the given name. To extract a large set of candidate nicknames efficiently from snippets retrieved from a web search engine, I propose an approach based on automatically extracted lexical patterns. I define several ranking scores to evaluate candidate nicknames using three approaches: (1) lexical pattern frequency, (2) word co-occurrences in an anchor text graph, and (3) page counts on the web. To construct a robust nickname detection system, I integrate the different ranking scores into a single ranking function using ranking support vector machines. I evaluate the proposed method on three data sets: an English personal names data set, a place names data set, and a popular personal names data set. The proposed method outperforms numerous baselines and previously proposed name alias extraction methods, achieving a statistically significant mean reciprocal rank (MRR) of 0.67. Experiments carried out using location names and popular personal names suggest the possibility of extending the proposed method to extract nicknames for different types of named entities and for different languages.

Keywords: Mean Reciprocal Rank (MRR)
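The mean reciprocal rank used as the evaluation measure in the abstract can be illustrated with a minimal sketch. The function names and the toy rankings below are hypothetical and are not taken from the paper. The sketch assumes the usual definition: for each query name, take the reciprocal of the rank of the first correct nickname in the ranked candidate list (0 if none appears), then average over all query names.

```python
from typing import Dict, List, Set


def reciprocal_rank(ranked: List[str], correct: Set[str]) -> float:
    """Return 1/rank of the first correct candidate, or 0.0 if none is found."""
    normalized = {alias.lower() for alias in correct}
    for position, candidate in enumerate(ranked, start=1):
        if candidate.lower() in normalized:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(
    rankings: Dict[str, List[str]], gold: Dict[str, Set[str]]
) -> float:
    """Average the reciprocal rank over every query name in `rankings`."""
    if not rankings:
        return 0.0
    total = sum(
        reciprocal_rank(ranked, gold.get(name, set()))
        for name, ranked in rankings.items()
    )
    return total / len(rankings)


# Toy data for illustration only (not results from the paper).
toy_rankings = {
    "Michael Jackson": ["the singer", "MJ", "king of pop"],  # first hit at rank 2
    "New York City": ["Big Apple", "the city"],              # first hit at rank 1
}
toy_gold = {
    "Michael Jackson": {"MJ", "King of Pop"},
    "New York City": {"Big Apple"},
}

toy_mrr = mean_reciprocal_rank(toy_rankings, toy_gold)  # (1/2 + 1/1) / 2
```

Under this definition, an MRR of 0.67 means that the first correct nickname appears, on average, between the first and second positions of the ranked list.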

Similar resources

Automatic Discovery of Lexical Patterns using Pattern Extraction Algorithm to Identify Personal Name Aliases with Entities

Personal name aliases are highly significant in information retrieval for retrieving complete information about a personal name from the web, as some of the web pages about the person may refer to him or her by an alias name / nick name / real name. There is rapid growth in people searching where personal name aliases are concerned. We proposed a pattern generator which includes aut...

Representing a method to identify and contrast with the fraud which is created by robots for developing websites’ traffic ranking

With the expansion of the Internet and the Web, communication and information gathering between individuals has shifted from its traditional form to web sites. The World Wide Web also offers a great opportunity for businesses to improve their relationship with the client and expand their marketplace in the online world. Businesses use a criterion called traffic ranking to determine their si...

Data-Driven Language Understanding for Spoken Language Dialogue∗

We present a natural-language customer service application for a telephone banking call center, developed as part of the AMITIES dialogue project (Automated Multilingual Interaction with Information and Services). Our dialogue system, based on empirical data gathered from real call-center conversations, features data-driven techniques that allow for spoken language understanding despite speech ...

P-11: Varicocele Improved Semen Quality and Chromatin Integrity in Normozoospermic and Non-Normozoospermic Varicocele Individual

Background: Varicocele is associated with poor semen quality and sperm chromatin integrity, which possibly decreases the spontaneous pregnancy potential. Some varicocele individuals have normal semen parameters with failed pregnancy. It is also one of the most controversial issues in the field of infertility, especially regarding why, when and for whom varicocelectomy should be implemented...

Identification and Classification of Desirable Web-Based Services from the Perspective of Website Users of Iran’s Hospitals Based on Kano Model of Customer Satisfaction

Background and Aim: A hospital website is an appropriate system for exchanging information and connecting patients, hospitals and medical staff. The purpose of this study was to identify and classify desirable web-based services in websites of Iran's hospitals based on Kano’s Customer Satisfaction Model. Materials and Methods: This was a survey study. The statistical population of the study co...

Publication date: 2013